G. K. Fanourakis, D. Fassouliotis, A. Leisos,
N. Mastroyiannopoulos and S. E. Tzamarias

Institute of Nuclear Physics - N.C.S.R. Demokritos

15310 - Aghia Paraskevi - Attiki - Greece

Abstract

This paper describes two generalizations of the Optimal Variables technique
for the simultaneous estimation of two Trilinear Gauge Couplings. The first is an iterative
procedure to perform a 2-dimensional fit using the linear terms of the expansion of
the probability density function with respect to the corresponding couplings, whilst
the second is a clustering method of probability distribution representation in five
dimensions. The pair production of W's at 183 GeV center of mass energy, where
one W decays leptonically and the other hadronically, was used to demonstrate the
optimal properties of the proposed estimation techniques.

1 Introduction 



The precision in measuring physical parameters depends strongly on how well the
detector resolution and efficiency are incorporated into the statistical estimators. However,
when several kinematical variables are needed to describe the physics process, the convolution
of the theoretical predictions with the detector effects is a difficult task. This is the case for
the estimation of the Trilinear Gauge Couplings (TGC's) at LEPII from the pair production of
W bosons, where one deals with an 8-dimensional phase space. In this case, the resolution
function describing the measuring process is an 8x8 matrix whose elements are functions of the
kinematical vector. There is no practical way to parameterize such a detector dependence
analytically unless an enormous number of Monte Carlo (M.C.) events is available. An
alternative procedure would be to project the probability distribution function onto a subset
of kinematical variables, thus decreasing the order of the resolution matrix without losing
sensitivity.

In a previous paper [1] it has been shown that, in one TGC estimation, the probability
distribution function (p.d.f.) $P(V; a)$ can be projected on the two variables $x_1$ and $x_2$ (the
Optimal Variables) without any loss of information. Specifically, in a phenomenological
scheme where only one coupling is free to deviate from the Standard Model value, the
differential cross section with respect to the 8-dimensional kinematical vector $V$ has a
quadratic dependence on the free coupling $a$ of the form:

$$\frac{d\sigma}{dV} = c_0(V) + c_1(V)\cdot a + c_2(V)\cdot a^2 \tag{1}$$

The p.d.f. $P(V; a)$, which carries the whole information concerning the coupling $a$, is
then defined as:

$$P(V; a)\,dV = \frac{c_0(V) + c_1(V)\cdot a + c_2(V)\cdot a^2}{S_0 + S_1\cdot a + S_2\cdot a^2}\,dV \tag{2}$$

where the denominator is the total cross section, i.e.:

$$S_i = \int c_i(V)\,dV \tag{3}$$

The projection of (2) on a plane defined by the two Optimal Variables $x_1$ and $x_2$,

$$w(x_1, x_2; a) = \int P(V; a)\cdot\delta\!\left(x_1 - \frac{c_1(V)}{c_0(V)}\right)\cdot\delta\!\left(x_2 - \frac{c_2(V)}{c_0(V)}\right)\cdot dV \tag{4}$$

contains all the information. The functional form of the Optimal Variables
($x_1 = c_1(V)/c_0(V)$, $x_2 = c_2(V)/c_0(V)$) is independent of phase space or other multiplicative (e.g. Initial
State Radiation) factors, containing only the coefficients of the polynomial realization of
the squared Matrix element by folding the kinematical information corresponding to the
hadronic jets.

When detector effects are to be taken into account, the Optimal Variables are defined by
the convolution of the differential cross section with the resolution and efficiency functions.
However, it has been shown [1] that their functional form can be approximated very
precisely as:

$$x_1 \simeq \frac{c_1(\Omega)}{c_0(\Omega)}, \qquad x_2 \simeq \frac{c_2(\Omega)}{c_0(\Omega)} \tag{5}$$

where $\Omega$ is the measured kinematical vector.

A binned likelihood fit, in bins of $x_1$, $x_2$, was demonstrated to estimate the coupling
with maximal accuracy even for small statistical samples (172 LEPII Run).
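Since the differential cross section of eq. (1) is quadratic in the coupling, its three coefficients at a given kinematical point can be recovered from three evaluations at different coupling values. The following is a minimal sketch of that arithmetic only. The callable `dsigma` is a hypothetical stand-in for a matrix element package, and the sampling points a = -1, 0, +1 are an illustrative choice, not the procedure of the paper.

```python
from typing import Callable, Tuple


def quadratic_coefficients(dsigma: Callable[[float], float]) -> Tuple[float, float, float]:
    """Recover (c0, c1, c2) of c0 + c1*a + c2*a**2 from its values at a = -1, 0, +1."""
    f_minus, f_zero, f_plus = dsigma(-1.0), dsigma(0.0), dsigma(1.0)
    c0 = f_zero
    c1 = 0.5 * (f_plus - f_minus)
    c2 = 0.5 * (f_plus + f_minus) - f_zero
    return c0, c1, c2


def optimal_variables(dsigma: Callable[[float], float]) -> Tuple[float, float]:
    """Return (x1, x2) = (c1/c0, c2/c0) at one kinematical point."""
    c0, c1, c2 = quadratic_coefficients(dsigma)
    return c1 / c0, c2 / c0


# Hypothetical toy cross section at a fixed kinematical point (illustration only).
def toy_dsigma(a: float) -> float:
    return 2.0 + 0.5 * a + 0.25 * a * a


x1, x2 = optimal_variables(toy_dsigma)
print(x1, x2)
```

With detector effects, the same ratios would be evaluated at the measured vector, as in eq. (5).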

Despite the success of the Optimal Variable method in one parameter estimation, the
same technique is not easily extended to multi parametric fits. As an example, in a TGC
scheme where two couplings can deviate from their S.M. values, the p.d.f. is written as:

$$P(V; a_1, a_2)\cdot dV = \frac{\sum_{i+j\le 2} c_{ij}(V)\, a_1^i a_2^j}{\sum_{i+j\le 2} S_{ij}\, a_1^i a_2^j}\cdot dV, \qquad S_{ij} = \int c_{ij}(V)\,dV \tag{6}$$

In this scheme five Optimal Variables are needed to contain the whole information, namely:

$$x_1 = \frac{c_{01}(V)}{c_{00}(V)}, \quad x_2 = \frac{c_{10}(V)}{c_{00}(V)}, \quad x_{3,4,5} = \frac{c_{ij}(V)}{c_{00}(V)} \ \text{with}\ (ij) \in \{11, 02, 20\} \tag{7}$$

Although there are other maximum likelihood equivalent strategies [?] which
further reduce the number of the necessary variables, it is interesting to see that
unbiased and efficient binned likelihood fits can be made in many dimensions as well.

In this paper we propose two new techniques for the simultaneous estimation of two
TGC's, based on evaluating the cross section of the process $e^+e^- \to \ell\nu jj$ in bins of
the Optimal Variables. In both methods the M.C. reweighting procedure is used to
express the cross sections and the probabilities in every bin as functions of the two TGC
couplings. The reweighted sample of M.C. events consisted of 60000 fully reconstructed
four fermion events in $\ell\nu jj$ final states. About 40% of them have been
generated by PYTHIA (including only CC03 production processes) and another 40% by
EXCALIBUR [?] (including the full list of 4 fermion diagrams) at Standard
Model (S.M.) coupling values. The remaining 20% of these events have been generated by
EXCALIBUR at several anomalous coupling values.
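The reweighting idea can be sketched as follows, assuming that the polynomial coefficients $c_{ij}$ of each M.C. event are available (here as a dictionary keyed by the powers of $a_1$ and $a_2$) and that the whole sample was generated at a single coupling point. The names `sigma_gen` (generated cross section) and `n_gen` (number of generated events) are hypothetical, and the mixture of generators described above is not modelled.

```python
import math
from typing import Dict, Sequence, Tuple

Coeffs = Dict[Tuple[int, int], float]  # (i, j) -> c_ij(V) of one event, i + j <= 2


def density(c: Coeffs, a1: float, a2: float) -> float:
    """Sum of c_ij * a1**i * a2**j: the differential cross section at (a1, a2)."""
    return sum(cij * a1 ** i * a2 ** j for (i, j), cij in c.items())


def bin_cross_section(events: Sequence[Coeffs], sigma_gen: float, n_gen: int,
                      a1: float, a2: float,
                      a1_gen: float = 0.0, a2_gen: float = 0.0) -> Tuple[float, float]:
    """Cross section in one bin and its M.C. statistical error at couplings (a1, a2).

    `events` holds the coefficients of the M.C. events that fall in the bin.
    """
    weights = [density(c, a1, a2) / density(c, a1_gen, a2_gen) for c in events]
    scale = sigma_gen / n_gen
    return scale * sum(weights), scale * math.sqrt(sum(w * w for w in weights))


# Hypothetical coefficients of one event, used twice to fill a toy bin.
toy_event = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.2, (2, 0): 0.1, (0, 2): 0.1, (1, 1): 0.05}
sm = bin_cross_section([toy_event, toy_event], sigma_gen=10.0, n_gen=100, a1=0.0, a2=0.0)
shifted = bin_cross_section([toy_event, toy_event], sigma_gen=10.0, n_gen=100, a1=1.0, a2=0.0)
print(sm, shifted)
```

At the generation point every weight equals one, so the bin cross section reduces to the generated cross section times the fraction of events in the bin.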

The simulation of the detector effects was performed by deploying the DELSIM [?]
package whilst the event selection algorithms were the same as the ones described in
[11] and [?]. The effect of the background contamination of the data samples has been
studied by producing M.C. sets corresponding to physical processes [?] with final
state topologies accepted by the selection criteria of the genuine WW events.

2 Iterative estimation with Optimal Variables 

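The p.d.f. (6) can be written in a Taylor expansion around the parametric point $\{a_1^0, a_2^0\}$
as:

$$P(V; a_1, a_2) = \frac{y_{00}(V)}{Y_{00}}\left[1 + \left(\frac{y_{10}(V)}{y_{00}(V)} - \frac{Y_{10}}{Y_{00}}\right)\cdot\delta a_1 + \left(\frac{y_{01}(V)}{y_{00}(V)} - \frac{Y_{01}}{Y_{00}}\right)\cdot\delta a_2\right] + O(\delta a_1^2, \delta a_2^2, \delta a_1\delta a_2) \tag{8}$$

where:

$$
\begin{aligned}
y_{00}(V; a_1^0, a_2^0) &= c_{00}(V) + c_{01}(V)\,a_2^0 + c_{10}(V)\,a_1^0 + c_{20}(V)\,(a_1^0)^2 + c_{02}(V)\,(a_2^0)^2 + c_{11}(V)\,a_1^0 a_2^0 \\
y_{10}(V; a_1^0, a_2^0) &= c_{10}(V) + 2c_{20}(V)\,a_1^0 + c_{11}(V)\,a_2^0 \\
y_{01}(V; a_1^0, a_2^0) &= c_{01}(V) + 2c_{02}(V)\,a_2^0 + c_{11}(V)\,a_1^0 \\
Y_{ij} &= \int y_{ij}(V; a_1^0, a_2^0)\,dV
\end{aligned}
\tag{9}
$$

and $\delta a_1$, $\delta a_2$ being the deviations from $a_1^0$ and $a_2^0$ respectively. For coupling values close
to the expansion point, the p.d.f. is accurately approximated by keeping only the linear
terms of (8). In this approximation the p.d.f. is a function of the variables

$$z_1(V; a_1^0, a_2^0) = \frac{y_{10}(V; a_1^0, a_2^0)}{y_{00}(V; a_1^0, a_2^0)}, \qquad z_2(V; a_1^0, a_2^0) = \frac{y_{01}(V; a_1^0, a_2^0)}{y_{00}(V; a_1^0, a_2^0)} \tag{10}$$

rather than the kinematical vector $V$ itself.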

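By including the influence of the detector in the determination of the kinematical
variables, the p.d.f. with respect to the measured kinematical vector $\Omega$ should be expressed
as:

$$P(\Omega; a_1, a_2) = \frac{\int \sum_{i+j\le 2} c_{ij}(V)\,a_1^i a_2^j\,\epsilon(V)\,R(V;\Omega)\,dV}{\int\!\!\int \sum_{i+j\le 2} c_{ij}(V)\,a_1^i a_2^j\,\epsilon(V)\,R(V;\Omega)\,dV\,d\Omega} \tag{11}$$

where $\epsilon(V)$ is the differential efficiency, and $R(V;\Omega)$ is the resolution function. By
expanding (11) in a Taylor series around the $\{a_1^0, a_2^0\}$ parametric point, a similar expression
as in (8) is achieved. Namely: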

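$$P(\Omega; a_1, a_2) = \frac{y_{00}(\Omega; a_1^0, a_2^0)}{Y_{00}}\left[1 + \left(\frac{y_{10}(\Omega; a_1^0, a_2^0)}{y_{00}(\Omega; a_1^0, a_2^0)} - \frac{Y_{10}}{Y_{00}}\right)\cdot\delta a_1 + \left(\frac{y_{01}(\Omega; a_1^0, a_2^0)}{y_{00}(\Omega; a_1^0, a_2^0)} - \frac{Y_{01}}{Y_{00}}\right)\cdot\delta a_2\right] + O(\delta a_1^2, \delta a_2^2, \delta a_1\delta a_2) \tag{12}$$

where the terms $y_{ij}(\Omega; a_1^0, a_2^0)$ are convolutions of the functions $y_{ij}(V; a_1^0, a_2^0)$, given in
(9), with the detector functions, i.e.

$$y_{ij}(\Omega; a_1^0, a_2^0) = \int y_{ij}(V; a_1^0, a_2^0)\,\epsilon(V)\,R(V;\Omega)\,dV \tag{13}$$

The Optimal Variables, ignoring higher orders in (12), are the ratios:

$$
\begin{aligned}
z_1(\Omega; a_1^0, a_2^0) &= \frac{\int y_{10}(V; a_1^0, a_2^0)\,\epsilon(V)\,R(V;\Omega)\,dV}{\int y_{00}(V; a_1^0, a_2^0)\,\epsilon(V)\,R(V;\Omega)\,dV} \\
z_2(\Omega; a_1^0, a_2^0) &= \frac{\int y_{01}(V; a_1^0, a_2^0)\,\epsilon(V)\,R(V;\Omega)\,dV}{\int y_{00}(V; a_1^0, a_2^0)\,\epsilon(V)\,R(V;\Omega)\,dV}
\end{aligned}
\tag{14}
$$

As has been shown in [1], the functional form of the Optimal Variables, including
detector effects, can be approximated as:

$$
z_1(\Omega; a_1^0, a_2^0) \simeq \left.\frac{y_{10}(V; a_1^0, a_2^0)}{y_{00}(V; a_1^0, a_2^0)}\right|_{V=\Omega}, \qquad
z_2(\Omega; a_1^0, a_2^0) \simeq \left.\frac{y_{01}(V; a_1^0, a_2^0)}{y_{00}(V; a_1^0, a_2^0)}\right|_{V=\Omega}
\tag{15}
$$

where the measured kinematical vector $\Omega$, instead of the real vector $V$, is used to define
the expansion coefficients in eq. (9).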
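As an illustration of eqs. (9) and (15), the sketch below evaluates the two linearized variables at one kinematical point from the six polynomial coefficients. It assumes the convention that $c_{ij}$ multiplies $a_1^i a_2^j$, so that $y_{10}$ and $y_{01}$ are the derivatives of $y_{00}$ with respect to the first and second coupling. The coefficient values are hypothetical.

```python
from typing import Dict, Tuple

Coeffs = Dict[Tuple[int, int], float]  # (i, j) -> c_ij at one kinematical point


def expansion_terms(c: Coeffs, a1_0: float, a2_0: float) -> Tuple[float, float, float]:
    """Return (y00, y10, y01) of eq. (9) at the expansion point (a1_0, a2_0)."""
    y00 = (c[0, 0] + c[0, 1] * a2_0 + c[1, 0] * a1_0
           + c[2, 0] * a1_0 ** 2 + c[0, 2] * a2_0 ** 2 + c[1, 1] * a1_0 * a2_0)
    y10 = c[1, 0] + 2.0 * c[2, 0] * a1_0 + c[1, 1] * a2_0  # derivative w.r.t. a1
    y01 = c[0, 1] + 2.0 * c[0, 2] * a2_0 + c[1, 1] * a1_0  # derivative w.r.t. a2
    return y00, y10, y01


def linear_optimal_variables(c: Coeffs, a1_0: float, a2_0: float) -> Tuple[float, float]:
    """Return (z1, z2) = (y10/y00, y01/y00) at the expansion point."""
    y00, y10, y01 = expansion_terms(c, a1_0, a2_0)
    return y10 / y00, y01 / y00


# Hypothetical coefficients, evaluated at the measured kinematical vector of one event.
toy = {(0, 0): 1.0, (1, 0): 0.5, (0, 1): 0.2, (2, 0): 0.1, (0, 2): 0.1, (1, 1): 0.05}
z_sm = linear_optimal_variables(toy, 0.0, 0.0)
z_shifted = linear_optimal_variables(toy, 0.5, -0.5)
print(z_sm, z_shifted)
```

At the S.M. expansion point the two variables reduce to $c_{10}/c_{00}$ and $c_{01}/c_{00}$. Moving the expansion point changes their functional form, which is why the procedure of this section is iterative.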

Based on this analysis, the simultaneous estimation of two couplings is realized by an
iterative procedure consisting of the following steps:

1. Define the functional form of the Optimal Variables around the expansion point
$\{a_1^0, a_2^0\}$, as in (9) and (15), by using the observed kinematical vectors as input to
the ERATO [?] four-fermion matrix element package.

2. Evaluate the differential cross sections ($A_i(a_1, a_2)$, $i = 1, 2, \ldots, k$) in $k$ 2-dimensional
bins of the Optimal Variables as functions of the $a_1$ and $a_2$ TGC values by means
of a reweighted Monte Carlo integration.

3. Estimate the coupling values by maximizing an extended likelihood function, thus
taking into account the total number of the observed events. In order to include
inaccuracies due to the M.C. evaluation of the cross sections, the extended likelihood
function is written as:

$$L_E(a_1, a_2; a_1^0, a_2^0) = \prod_{i=1}^{k} \int \frac{\mu_i^{n_i} e^{-\mu_i}}{n_i!}\cdot\frac{1}{\sqrt{2\pi}\,\sigma_i(a_1,a_2)}\exp\!\left(-\frac{(\mu_i - s_i(a_1,a_2))^2}{2\,\sigma_i^2(a_1,a_2)}\right)d\mu_i \tag{16}$$

where

$s_i(a_1, a_2) = A_i(a_1, a_2)\cdot\mathcal{L}$ is the number of the expected events in the bin $i$ for
coupling values $a_1$ and $a_2$ and integrated luminosity $\mathcal{L}$,

$\sigma_i(a_1, a_2)$ is the estimated error in the determination of $s_i(a_1, a_2)$,

$n_i$ is the number of the observed events in the bin $i$.

4. The likelihood estimates of the couplings, $\hat a_1$ and $\hat a_2$, are used as a new expansion
point at step 1 and the whole procedure is repeated.

The iteration is considered to have converged when the estimated values of the
couplings are equal to those which have been used as expansion values.
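The control flow of the four steps above can be sketched as a fixed point iteration. In this sketch `fit_at` is a hypothetical callable that stands for steps 1 to 3 (define the variables at an expansion point, evaluate the bin cross sections, maximize the likelihood) and returns the estimated couplings. The numerical tolerance `tol` is an assumption, since the text only requires the estimates to equal the expansion values.

```python
from typing import Callable, Tuple

Point = Tuple[float, float]


def iterate_to_convergence(fit_at: Callable[[Point], Point], start: Point,
                           tol: float = 1e-4, max_iter: int = 50) -> Tuple[Point, int]:
    """Repeat the fit, feeding each estimate back as the next expansion point."""
    point = start
    for n_iter in range(1, max_iter + 1):
        estimate = fit_at(point)
        shift = max(abs(estimate[0] - point[0]), abs(estimate[1] - point[1]))
        if shift < tol:  # estimate equals the expansion point within tolerance
            return estimate, n_iter
        point = estimate
    raise RuntimeError("iteration did not converge")


# Toy stand-in for the fit: each pass recovers 80% of the distance to the true values.
def toy_fit(point: Point) -> Point:
    true = (0.3, -0.1)
    return (point[0] + 0.8 * (true[0] - point[0]),
            point[1] + 0.8 * (true[1] - point[1]))


result, n_iter = iterate_to_convergence(toy_fit, (0.0, 0.0))
print(result, n_iter)
```

The toy fit only mimics an estimator whose bias shrinks as the expansion point approaches the true couplings. It says nothing about the convergence rate of the real procedure.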

The converging properties of the proposed technique are demonstrated by a simultaneous
estimation of the $\lambda_\gamma$ and $\Delta g_1^Z$ couplings using M.C. generated events as data samples.
These are three sets of M.C. events¹, produced by the EXCALIBUR four fermion generator
at different points of the parametric space, which have undergone full detector simulation and have
been reconstructed and selected in the same way as the real data. In figures 1a and 1c
the deviations of the estimated values of the two couplings ($\hat a_1 = \hat\lambda_\gamma$, $\hat a_2 = \Delta\hat g_1^Z$) from the
corresponding expansion values ($a_1^0 = \lambda_\gamma^0$, $a_2^0 = \Delta g_1^{Z\,0}$) are shown, for several expansion points.
These are the couplings estimated by applying the proposed technique to the M.C. set of
events which has been produced with S.M. values. The intersections of these deviation
surfaces with the plane of zero deviation, corresponding to the fits where the estimated
value of each individual coupling is equal to the expansion value, are shown in figures 1b
and 1d. The geometry of these intersection lines is such that there is only one parametric
point at which the expansion and estimated values are equal for both couplings. This
point of convergence (indicated on both the intersections as a star) is very close to the
S.M. couplings used for the generation of the data samples. In figure 2, similar deviation
surfaces are shown, corresponding to the sets produced with $\lambda_\gamma = 1$, $\Delta g_1^Z = 0$ (a,b,c,d)
and $\lambda_\gamma = -1$, $\Delta g_1^Z = 0$ (e,f,g,h) values. The convergence points in these last examples
also consistently match the true coupling values.

¹ 6000 $\mu\nu jj$ events produced at ($\lambda_\gamma = 0$, $\Delta g_1^Z = 0$), 1000 $\mu\nu jj$ events produced at ($\lambda_\gamma = 1$, $\Delta g_1^Z = 0$),
and 1000 $\mu\nu jj$ events produced at ($\lambda_\gamma = -1$, $\Delta g_1^Z = 0$).



3 Multidimensional fits with the Clustering technique

The general expression of the p.d.f. (11) depends on the measured kinematical vector $\Omega$
through the five Optimal Variables:

$$\omega_{ij}(\Omega) = \frac{\int c_{ij}(V)\cdot\epsilon(V)\cdot R(V;\Omega)\,dV}{\int c_{00}(V)\cdot\epsilon(V)\cdot R(V;\Omega)\,dV} \tag{17}$$

That is, the projection

$$\Pi(R_1, R_2, \ldots, R_5; a_1, a_2) = \int P(\Omega; a_1, a_2)\cdot\delta(R_1 - \omega_{01})\cdot\delta(R_2 - \omega_{10})\cdots\delta(R_5 - \omega_{11})\cdot d\Omega \tag{18}$$

carries the whole information concerning the couplings $a_1$ and $a_2$. Furthermore, by writing
(17) as:

$$\omega_{ij}(\Omega) = \frac{\int c_{00}(V)\cdot\frac{c_{ij}(V)}{c_{00}(V)}\cdot\epsilon(V)\cdot R(V;\Omega)\,dV}{\int c_{00}(V)\cdot\epsilon(V)\cdot R(V;\Omega)\,dV} \tag{19}$$

we could repeat the same arguments as in [1] to approximate the functional form of the
Optimal Variables as:

$$\omega_{ij}(\Omega) \simeq \frac{c_{ij}(\Omega)}{c_{00}(\Omega)} \tag{20}$$

by using the observed kinematical vector as input to the ERATO four-fermion matrix element
package². In figure 3 this approximation (20) is tested by plotting the mean values
of the quantities $c_{ij}(V)/c_{00}(V)$, for events produced with coupling values equal to zero and
observed in a bin of $\Omega$, versus the approximated expression of the Optimal Observables
$c_{ij}(\Omega)/c_{00}(\Omega)$. In the same figure the straight lines indicate where the two expressions are equal.
Although this is an inclusive behavior of the approximation and does not necessarily prove
that it holds at every point of the phase space, it is an indicative demonstration of
its validity. An empirical proof will be obtained in the following sections by using (20)
in fits and comparing with the unbinned maximum likelihood technique.

² The expression (19) defines $\omega_{ij}(\Omega)$ as the mean value of the function $c_{ij}(V)/c_{00}(V)$, where the vector $V$ follows
the p.d.f. $c_{00}(V)/S_{00}$, has been selected and has been reconstructed in the phase space interval ($\Omega$, $\Omega + d\Omega$).

The p.d.f. (18) could be evaluated in bins of the five Optimal Variables of (20),
by means of a M.C. integration, provided that sufficient statistics of fully
reconstructed M.C. events are available. As an example, if one uses 10 bins per Optimal Variable and
demands an average of 100 M.C. events per bin then a total of $10^7$ (!!) reconstructed M.C.
events is needed in order to represent the p.d.f. with a 10% evaluation error. On the other
hand, the data samples accumulated during the 183 GeV run of LEP II are of the order
of 200 events (in all the semileptonic channels) per experiment.

By inverting the argument, one could demand the division of the available M.C. statistics
into as many 5-dimensional bins as the number of the accumulated events. In doing
so, several semi-analytic kernel techniques [?] could be deployed to represent the p.d.f.
However, none of them [?] guarantees unbiased results for every application. In the
following we propose a method of distribution representation which, instead of optimizing
the shape and magnitude of the kernel function, uses the data points to divide the
space into equiprobable multidimensional bins.

Let a sample of $n_d$ selected real events be described by the set of $n_d$ Optimal Variable
vectors $\vec R_i = (R_i^1, R_i^2, R_i^3, R_i^4, R_i^5)$ with $i = 1, 2, \ldots, n_d$. In parallel, let us assume that
there are $N$ M.C. events with Optimal Variable vectors $\vec r_k = (r_k^1, r_k^2, r_k^3, r_k^4, r_k^5)$ where
$k = 1, 2, \ldots, N$. The scalar distance of each of the M.C. events to each of the data points
is formed as:

$$D_{ik} = (\vec R_i - \vec r_k)^T\cdot M\cdot(\vec R_i - \vec r_k) \tag{21}$$

In this distance definition, $M$ is a 5x5 matrix representing the metric of the space. The
$j^{th}$ M.C. event is associated to the $n^{th}$ datum if $D_{nj}$ is the minimum of all the $D_{\lambda j}$,
$\lambda = 1, \ldots, n_d$. The $n^{th}$ bin thus corresponds to the cluster of $m_n$ M.C. events being
associated with the $n^{th}$ real event. The cross section $A_n(a_1, a_2)$, its error $\sigma_n(a_1, a_2)$ and
their dependence on the coupling values are evaluated by M.C. reweighting using these
$m_n$ events. Obviously this association results in an equiprobable division of the space,
assuming that the best available knowledge of the p.d.f. is that of the real data points
themselves. The coupling values are then estimated by a maximization of the binned
extended likelihood function which in this case is defined as:

$$L_E = \prod_{i=1}^{n_d} \int \mu_i\, e^{-\mu_i}\cdot\frac{1}{\sqrt{2\pi}\cdot\mathcal{L}\cdot\sigma_i(a_1,a_2)}\exp\!\left(-\frac{(\mu_i - \mathcal{L}\cdot A_i(a_1,a_2))^2}{2\cdot(\mathcal{L}\cdot\sigma_i(a_1,a_2))^2}\right)d\mu_i \tag{22}$$

where $\mathcal{L}$ is the available luminosity³. The proposed technique coincides with the
standard binned analysis only in one dimensional problems when each bin corresponds to
one real datum.
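Each factor of the likelihood (22), like each factor of (16), is a Poisson probability averaged over a Gaussian uncertainty on the expected number of events. The sketch below evaluates such a factor by a plain trapezoidal integration. The integration range (eight standard deviations, cut at zero) and the number of steps are assumptions made for illustration, not choices taken from the paper.

```python
import math
from typing import Sequence


def smeared_poisson(n: int, expected: float, sigma: float, steps: int = 2000) -> float:
    """Integral over mu >= 0 of Poisson(n; mu) * Gauss(mu; expected, sigma)."""
    lo = max(0.0, expected - 8.0 * sigma)
    hi = expected + 8.0 * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        mu = lo + k * h
        if mu > 0.0:
            pois = math.exp(n * math.log(mu) - mu - math.lgamma(n + 1))
        else:
            pois = 1.0 if n == 0 else 0.0
        gauss = math.exp(-0.5 * ((mu - expected) / sigma) ** 2) / (math.sqrt(2.0 * math.pi) * sigma)
        weight = 0.5 if k in (0, steps) else 1.0  # trapezoid end points
        total += weight * pois * gauss
    return total * h


def log_likelihood(observed: Sequence[int], expected: Sequence[float],
                   sigmas: Sequence[float]) -> float:
    """Sum of the logarithms of the per-bin factors."""
    return sum(math.log(smeared_poisson(n, s, e))
               for n, s, e in zip(observed, expected, sigmas))


# One datum per bin (n = 1), as in eq. (22); expectations and errors are toy values.
narrow = smeared_poisson(1, 1.0, 1e-3)
wide = smeared_poisson(1, 1.0, 0.3)
toy_logl = log_likelihood([1, 1, 1], [0.9, 1.0, 1.2], [0.1, 0.1, 0.2])
print(narrow, wide, toy_logl)
```

For a negligible M.C. error the factor tends to the plain Poisson probability, while a larger error lowers the peak value, which is how poorly populated clusters lose weight in the fit.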

The metric matrix in the distance definition (21) is used to enhance the importance
of one variable relative to another, in exactly the same way as one decides to use more
bins in one dimension than in another in a standard bin analysis.

In this analysis a metric matrix with zero non-diagonal elements has been used. The
diagonal elements have been chosen to be the inverses of the mean squares of the inclusive data
distributions with respect to each of the Optimal Variables. Such a choice corresponds to
a standard bin analysis where the same number of equiprobable bins has been used in
every dimension. In principle the definition of the metric matrix depends on the particular
problem (e.g. on the information which each variable is carrying and on the possible
correlations between the variables) and should be chosen by M.C. experimentation.

³ Note that eq. (22), in the asymptotic limit ($n_d, N \to \infty$), is the unbinned extended likelihood
function.
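A minimal sketch of the association rule of eq. (21) with a diagonal metric follows. It reads "mean squares" literally as the mean of the squared values of each variable over the data, which is an assumption about the intended definition, and it works in any number of dimensions. The two-dimensional points are hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def inverse_mean_square_metric(data: Sequence[Vector]) -> List[float]:
    """Diagonal metric: one over the mean square of the data in each dimension."""
    dim = len(data[0])
    return [len(data) / sum(point[d] ** 2 for point in data) for d in range(dim)]


def scaled_distance(a: Vector, b: Vector, metric: Sequence[float]) -> float:
    """Distance of eq. (21) for a diagonal metric."""
    return sum(m * (x - y) ** 2 for m, x, y in zip(metric, a, b))


def associate(mc: Sequence[Vector], data: Sequence[Vector],
              metric: Sequence[float]) -> List[List[int]]:
    """clusters[n] lists the indices of the M.C. events nearest to the n-th datum."""
    clusters: List[List[int]] = [[] for _ in data]
    for k, event in enumerate(mc):
        nearest = min(range(len(data)),
                      key=lambda i: scaled_distance(data[i], event, metric))
        clusters[nearest].append(k)
    return clusters


# Toy example in two dimensions: two data points and four M.C. events.
data = [(1.0, 1.0), (5.0, 2.0)]
mc = [(1.2, 0.9), (4.0, 2.5), (0.5, 1.5), (5.5, 1.8)]
metric = inverse_mean_square_metric(data)
clusters = associate(mc, data, metric)
print(metric, clusters)
```

The M.C. events collected in `clusters[n]` are the ones that would be reweighted to give the cross section and its error for the bin of the n-th datum.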

The accuracy of the proposed technique depends strongly on the number of M.C. events
associated to each of the real data. Although this fact is taken into account
in the extended likelihood function definition (22), the proposed procedure breaks down
when none (or practically very few) of the M.C. events are associated to some data points.
Obviously such pathologies are easily avoided, even in the case of a limited M.C. statistical
sample, when the p.d.f. used in the M.C. generation is similar to the kinematical
distribution of the real events. Alternatively, the data points should be grouped together, thus
defining larger bins (mega-bins) with adequate M.C. contribution. As an example, such
a grouping will be necessary in situations where a significant number of events has
been collected and the use of so many bins is impractical. In this case the goal consists
in dividing the phase space into (almost equiprobable) mega-bins containing several of the
accumulated real events. The grouping of the data points should be such that the overall
variance within the groups is minimum. In other words, if one chooses to group the
$n_d$ data points in $g$ groups then the optimal grouping is the one which minimizes the
quantity

$$W = \sum_{\lambda=1}^{g}\sum_{k=1}^{n_\lambda} (\vec G_\lambda - \vec R_k^\lambda)^T\, M\, (\vec G_\lambda - \vec R_k^\lambda) \tag{23}$$

where $\vec R_k^\lambda$ ($k = 1, 2, \ldots, n_\lambda$) are the Optimal Variable vectors of the data points belonging
to the $\lambda^{th}$ group and $\vec G_\lambda$ is the center⁴ of the $\vec R_k^\lambda$ vectors of the $\lambda^{th}$ group.
An iterative way of approximating the optimal grouping is the so called K-means clustering
[15]. This is an iterative algorithm where in the zeroth step $g$ arbitrary data points are
used as centers. The rest of the events are grouped taking into account their scaled
distance (by the metric matrix) from each of the centers. The centers of each group are
reevaluated and the data points are redistributed according to their scaled distances to
the new centers. The procedure is repeated until no more data points are migrating.
In applying this method, the $g$ vectors $\vec G_\lambda$ are used in eq. (21) to cluster the M.C. events,
to define the mega-bins and to evaluate the corresponding cross sections as before. The
likelihood function is defined as in (22) with the obvious difference that the poissonian
terms represent the observation of $n_\lambda$ (instead of one) events in each of the $\lambda = 1, 2, \ldots, g$
mega-bins.
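The grouping step described above can be sketched with a plain K-means loop that uses the scaled distance of eq. (21) for a diagonal metric. Taking the first `g` points as initial centers is an illustrative choice (the text only asks for arbitrary data points), and the data are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def scaled_distance(a: Vector, b: Vector, metric: Sequence[float]) -> float:
    """Distance of eq. (21) for a diagonal metric."""
    return sum(m * (x - y) ** 2 for m, x, y in zip(metric, a, b))


def kmeans(points: Sequence[Vector], g: int, metric: Sequence[float],
           max_iter: int = 100) -> Tuple[List[List[float]], List[int]]:
    """Group the points in g clusters; returns (centers, label of each point)."""
    centers = [list(p) for p in points[:g]]  # zeroth step: g data points as centers
    labels = [-1] * len(points)
    for _ in range(max_iter):
        new_labels = [min(range(g), key=lambda c: scaled_distance(p, centers[c], metric))
                      for p in points]
        if new_labels == labels:  # no data point migrated: stop
            break
        labels = new_labels
        for c in range(g):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # the mean of the members minimizes their summed scaled distance
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centers, labels = kmeans(points, g=2, metric=[1.0, 1.0])
print(centers, labels)
```

The returned centers would then play the role of the data points in the association of the M.C. events, with the number of members of each group entering the poissonian term of the corresponding mega-bin.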



⁴ This is the vector $\vec G_\lambda$ which minimizes the expression $\sum_{k=1}^{n_\lambda} (\vec G_\lambda - \vec R_k^\lambda)^T\, M\, (\vec G_\lambda - \vec R_k^\lambda)$.

4 Numerical results



In order to demonstrate the properties of the proposed techniques in fitting finite statistical
samples, a series of M.C. experiments has been performed. Fully reconstructed four
fermion EXCALIBUR events, produced with S.M. coupling values, were mixed with background
events to form data sets corresponding to the 50.23 pb$^{-1}$ luminosity accumulated
by the DELPHI detector [?] at the 183 GeV Run of LEP II. Each of the sets consisted of
82, 101 and 39 events, on average, with an electron, muon and tau lepton in the final state
respectively. The average background contributions to the above subsets were 8.0,
1.4 and 8.3 events. The specific event multiplicity of each data set was chosen to follow
a poissonian distribution. Another set of fully reconstructed four fermion and background
events, produced and selected as described in Section 1, was used to calculate cross
sections and probabilities as well as their dependence on the TGC's by reweighted Monte
Carlo integration. In fitting the data sets the ($\lambda_\gamma$, $\Delta g_1^Z$) and the ($\Delta\kappa_\gamma$, $\Delta g_1^Z$) TGC schemes [?]
were used, where a simultaneous estimation of the free couplings was performed.

The asymptotic property of the log likelihood ratio $\Lambda$⁵ [?] was used in order to demonstrate
the unbiasedness of the proposed techniques. That is, the $\chi^2$(n.d.f.=2) probability
of obtaining the specific value of $\Lambda$

$$\Lambda = -2\cdot\log\frac{L_E(a_1^{true}, a_2^{true})}{L_E(\hat a_1, \hat a_2)} \tag{24}$$

(with $a_1^{true}$, $a_2^{true}$ the coupling values used in the production of the events) in each fit of the
data sets should follow an equiprobable distribution.
Furthermore, the consistency of evaluating correctly the error matrix ($S$) in each estimation
is checked by using the asymptotic property of the likelihood estimations to be
gaussian distributed around the true parameter values. Thus for an unbiased estimation
of central values and for correct error matrix evaluation the quantity $\delta$:

$$\delta = (\hat a_1 - a_1^{true},\ \hat a_2 - a_2^{true})\cdot S^{-1}\cdot(\hat a_1 - a_1^{true},\ \hat a_2 - a_2^{true})^T \tag{25}$$

should follow a $\chi^2$(n.d.f.=2) distribution. This property is demonstrated by presenting
the $\chi^2$(n.d.f.=2) probabilities to obtain specific $\delta$ values in fitting the data sets. The above
tests of the $\Lambda$ and $\delta$ distributions are extensions of the sampling and pull distribution tests
respectively, commonly used in one parametric fits.
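For two degrees of freedom the $\chi^2$ probability has a closed form, $P(\chi^2 > x) = e^{-x/2}$, so both tests reduce to simple arithmetic once the statistic is known. The sketch below computes the quadratic form of eq. (25) for a 2x2 error matrix and converts it to a probability. The estimates and the error matrix are hypothetical toy numbers.

```python
import math
from typing import Sequence


def chi2_prob_2dof(x: float) -> float:
    """Upper tail probability of a chi-square variable with two degrees of freedom."""
    return math.exp(-0.5 * x)


def delta_statistic(estimate: Sequence[float], true: Sequence[float],
                    cov: Sequence[Sequence[float]]) -> float:
    """Quadratic form d^T cov^-1 d for a 2x2 error matrix, with d = estimate - true."""
    d1, d2 = estimate[0] - true[0], estimate[1] - true[1]
    (s11, s12), (s21, s22) = cov
    det = s11 * s22 - s12 * s21
    return (s22 * d1 * d1 - (s12 + s21) * d1 * d2 + s11 * d2 * d2) / det


# Toy fit result: both couplings are one standard deviation away from the true values.
delta = delta_statistic((0.2, 0.1), (0.0, 0.0), [[0.04, 0.0], [0.0, 0.01]])
prob = chi2_prob_2dof(delta)
print(delta, prob)
```

If the estimator is unbiased and the errors are correct, probabilities obtained this way over many data sets should be uniformly distributed, i.e. with mean 0.5 and root mean square $1/\sqrt{12}$.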

Due to the limited number of the available M.C. events, only sixty independent data
sets could be constructed. Although this number of data sets is enough to indicate the
optimal properties of the proposed techniques, the bootstrap procedure⁶ [?] has been
used as well to construct a large number of semicorrelated data sets.
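A bootstrap resampling of this kind can be sketched in a few lines. The sketch assumes sampling with replacement from the pool, which the text does not state explicitly, and uses integers as stand-ins for events.

```python
import random
from typing import List, Sequence, TypeVar

Event = TypeVar("Event")


def bootstrap_sets(pool: Sequence[Event], set_size: int, n_sets: int,
                   seed: int = 0) -> List[List[Event]]:
    """Draw n_sets semicorrelated sets of set_size events each from the pool."""
    rng = random.Random(seed)  # fixed seed so that the resampling is reproducible
    return [[rng.choice(pool) for _ in range(set_size)] for _ in range(n_sets)]


pool = list(range(10))  # stand-in for the pool of available M.C. events
sets = bootstrap_sets(pool, set_size=3, n_sets=5)
print(sets)
```

Because the sets share events from the same pool they are not independent, which is why the text treats them separately from the sixty uncorrelated samples.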

The background contamination of these data sets was taken into account in both the
estimators (16) and (22) by including the contribution from non signal sources in the
expected number of events. In parallel, the evaluation error of these contributions was
also included in the convolutions.

⁵ In an unbiased estimation, twice the log ratio of the likelihood functions (16) or (22), evaluated
at couplings equal to the production values, to the likelihood values corresponding to the estimated couplings
should follow a $\chi^2$ distribution for two degrees of freedom.

⁶ The bootstrap procedure advocates that one can randomly select $\mathcal{N}$ events to form a set from a pool
of $\mathcal{K}$ available events, a large number of times. The distribution of statistics, evaluated from each of
the bootstrapped sets, approximates well the true distribution as long as $\mathcal{K}$ is big enough compared to
$\mathcal{N}$.

Results of estimating the ($\lambda_\gamma$, $\Delta g_1^Z$) and ($\Delta\kappa_\gamma$, $\Delta g_1^Z$) couplings with the Iterative
Optimal Variable technique are shown in figure 4. In both TGC schemes, the optimal
properties of the technique in estimating central values and error matrices are obvious.
Specifically, the sixty completely uncorrelated samples produce $\chi^2$(n.d.f. = 2) probabilities
(b,d,f,h) distributed with mean values close to 0.5 and root mean squares close to
$1/\sqrt{12}$, whilst the equiprobable (corresponding to zero slope when fitted to a first degree
polynomial) behavior of the $\chi^2$(n.d.f. = 2) probability values obtained by fitting the
bootstrapped (a,c,e,g) samples is striking.

In applying the Multidimensional Clustering technique the metric matrix elements
were evaluated separately for each fit according to the inclusive distributions of each
leptonic final state. Special care has been taken to define the limits in every Optimal Variable
direction and to avoid artificially large bins at the extrema of the joint distribution. As an
example, in figure 5 the inclusive distributions with respect to the five Optimal Variables
corresponding to the muonic final states of a single data set are shown. Only those
M.C. events which had their coordinates lying between the maxima and minima of the
observed Optimal Variables (extended by one tenth of the root mean square value)
were taken into account in the cluster definition.

Results obtained with the Multidimensional Clustering technique are shown in figure
6, where the consistent behavior of these estimations is apparent. In these clustering
experiments, each of the multidimensional bins was occupied by a single datum, thus
employing 240 bins on average.

The behavior of the $\Lambda$ and $\delta$ quantities is further used to quantify the sensitivity
of the proposed techniques. Indeed such properties [?] ensure that the estimated values
$\{\hat a_1, \hat a_2\}$ follow a two dimensional gaussian distribution with a covariance matrix which
characterizes the average sensitivity in estimating the couplings. The covariance matrix
elements for both techniques (i.e. the variances and correlations of the coupling
estimations) are found by fitting a 2-dim gaussian to the coupling values estimated from
the 60 independent sets. These average sensitivities are summarized in Tables 1 and 2 for
the ($\lambda_\gamma$, $\Delta g_1^Z$) and ($\Delta\kappa_\gamma$, $\Delta g_1^Z$) estimations.

The same uncorrelated M.C. sets of events were treated as if they had been collected by a
"perfect" detector and the two pairs of couplings were estimated by an unbinned extended
likelihood fit as well as by the Clustering and the Iterative Optimal Variable techniques⁷.
The average sensitivities obtained from these estimations ("perfect" extended unbinned
likelihood, "perfect" Iterative Optimal Variables and "perfect" Clustering technique) are
also shown for comparison in Tables 1 and 2, where the equivalence of the proposed
methods to the likelihood fits is obvious. The loss of sensitivity in the case of a realistic
detector is a natural consequence of the loss of information due to the imperfect measuring
resolution. However, the consistent inclusion of the detector effects in the realistic case
guarantees consistent central value and confidence interval estimation. It is also worth
noticing that for both the proposed methods (in the realistic case), the evaluated errors
and correlations in every individual estimation are gaussian distributed with means very
close to the average sensitivities, as shown in figure 7 and figure 8.

⁷ The true kinematical vector $V$ of each event of the data set was used to calculate the matrix element
and the probability content of each bin respectively. In the following, when an ideal detector is assumed
the method and the results will be characterized as "perfect".

The proposed Multidimensional Clustering technique is a general purpose procedure
which can be used in any binned fit provided that the metric matrix is properly defined. As
a demonstration, the properties of the estimations of a single coupling ($\Delta g_1^Z$) are shown in
figure 9 and figure 10. These are the results of two dimensional binned extended likelihood
fits, using either the Optimal Variables or the angular distributions of the hadronic and
leptonic part of the event [11]. Results (means of the sampling distributions, means and
sigmas of the pull distributions and expected errors) concerning the other couplings can
be found in Table 3, in comparison with the results which could be obtained by a "perfect"
unbinned extended likelihood.

Finally, the extension of the Multidimensional Clustering technique involving grouping
of the data points was applied to a data set of 6000 events. The data were divided into
64 groups by the K-means clustering algorithm and the mega-bins were defined by the
centers of the data clusters. Results of this method in estimating the ($\lambda_\gamma$, $\Delta g_1^Z$) couplings
are shown in figure 11 in comparison with the results of the Iterative Optimal Variable
technique (employing the same number of bins) and the "perfect" unbinned extended
likelihood fit.



5 Conclusions 

In this paper the Optimal Variable technique [1] was generalized in order to be applied
to the simultaneous estimation of two couplings using the appropriate TGC model [?].
Two generalization schemes were proposed: an iterative 2-dimensional procedure which
is based on expanding the p.d.f. in a Taylor series, and a method of
representing the p.d.f. in five dimensions using the real data points as seeds. The latter
is a novel kernel-type algorithm which can be used for any number of real events. Both
techniques were demonstrated to be asymptotically consistent estimators, including
the detector effects and the background contribution.

The properties of the techniques when fitting finite size event samples were investigated
by M.C. experimentation. Sets of M.C. events, of the same size as the data samples
accumulated by each of the LEP experiments at the 183 GeV run, were fitted by both
the proposed methods to estimate the $\{\lambda_\gamma, \Delta g_1^Z\}$ and $\{\Delta\kappa_\gamma, \Delta g_1^Z\}$ couplings. The
distributions of these estimations support the optimal behavior (unbiasedness, consistent error
matrix evaluation) of the techniques. Moreover, a comparison with the unbinned extended
likelihood results demonstrates that the Iterative Optimal Variable and Multidimensional
Clustering estimators practically reach the maximum sensitivity, as shown in
Tables 1 and 2. The deterioration of their sensitivity (up to 20%) when dealing with realistic
detectors is due to the imperfect resolution of the measuring apparatus.



A comparison [12] between the sensitivity of several multiparametric TGC estimators,
which include detector effects, shows that the proposed techniques are equivalent to the
Modified Observables technique whilst they outperform classical methods of one or two
dimensional binned likelihood fits [12].

Finally, in Table 3 the expected sensitivities of the Clustering technique when applied
to single coupling estimations are summarized. This method is a general purpose
procedure for representing any projection of the probability distribution functions. In this
study the $\Delta g_1^Z$, $\lambda_\gamma$ and $\Delta\kappa_\gamma$ couplings were estimated by using projections of the p.d.f.
onto the Optimal Variable plane and onto the plane defined by the cosines of the polar
angles of the hadronic system and the charged lepton ($\cos\theta_W$, $\cos\theta_\ell$). Naturally, the
estimations corresponding to the Optimal Variable choice are more sensitive than those of
the angular distributions, due to the information content of the projected p.d.f. These
results of the 2-dimensional fits with the Clustering procedure are completely equivalent
to the results obtained [12] by the standard binned analysis when using the same p.d.f.
projections.
| $\lambda_\gamma$ - $\Delta g_1^Z$ | $\sigma(\lambda_\gamma)$ | $\sigma(\Delta g_1^Z)$ | $\rho$ |
|---|---|---|---|
| "Perfect" Extended Likelihood | 0.21 ± 0.01 | 0.20 ± 0.01 | -0.73 ± 0.06 |
| Clustering ("Perfect") | 0.21 ± 0.01 | 0.20 ± 0.01 | -0.74 ± 0.06 |
| Iterative estimations ("Perfect") | 0.22 ± 0.01 | 0.21 ± 0.01 | -0.74 ± 0.06 |
| Clustering | 0.23 ± 0.01 | 0.22 ± 0.01 | -0.74 ± 0.06 |
| Iterative estimations | 0.24 ± 0.01 | 0.23 ± 0.01 | -0.72 ± 0.06 |

Table 1: Comparison of the statistical properties of the techniques proposed in this paper
with the unbinned extended likelihood estimations of the $\lambda_\gamma$ - $\Delta g_1^Z$ couplings.













| $\Delta\kappa_\gamma$ - $\Delta g_1^Z$ | $\sigma(\Delta\kappa_\gamma)$ | $\sigma(\Delta g_1^Z)$ | $\rho$ |
|---|---|---|---|
| "Perfect" Extended Likelihood | 0.35 ± 0.03 | 0.14 ± 0.01 | -0.22 ± 0.08 |
| Clustering ("Perfect") | 0.35 ± 0.03 | 0.15 ± 0.01 | -0.21 ± 0.09 |
| Iterative estimations ("Perfect") | 0.36 ± 0.03 | 0.15 ± 0.01 | -0.24 ± 0.09 |
| Clustering | 0.41 ± 0.03 | 0.15 ± 0.01 | -0.23 ± 0.10 |
| Iterative estimations | 0.43 ± 0.03 | 0.16 ± 0.01 | -0.27 ± 0.10 |

Table 2: Comparison of the statistical properties of the techniques proposed in this paper
with the unbinned extended likelihood estimations of the $\Delta\kappa_\gamma$ - $\Delta g_1^Z$ couplings.





                                       $\Delta g_1^Z$     $\lambda_\gamma$     $\Delta\kappa_\gamma$
"Perfect" Extended Likelihood
  mean of estimations                  0.006 ± 0.020    -0.006 ± 0.02    0.004 ± 0.06
  estimation accuracy                  0.14 ± 0.01      0.16 ± 0.02      0.49 ± 0.05
Clustering technique (Opt. Var.)
  mean of estimations                  0.00 ± 0.03      -0.01 ± 0.02     -0.02 ± 0.09
  estimation accuracy                  0.16 ± 0.02      0.16 ± 0.02      0.47 ± 0.07
  pull sigma                           1.07 ± 0.14      0.95 ± 0.11      1.13 ± 0.15
Clustering technique ($\cos\theta_W$, $\cos\theta_l$)
  mean of estimations                  -0.01 ± 0.03     -0.01 ± 0.02     -0.01 ± 0.08
  estimation accuracy                  0.18 ± 0.02      0.19 ± 0.02      0.56 ± 0.05
  pull sigma                           1.05 ± 0.11      1.1 ± 0.11       1.13 ± 0.11

Table 3: Comparison of the statistical properties of the Clustering procedure proposed in 
this paper with the single-coupling unbinned extended likelihood estimations of $\Delta g_1^Z$, $\lambda_\gamma$ and 
$\Delta\kappa_\gamma$. 
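
The "pull sigma" rows of Table 3 can be read with the usual convention, assumed here because this section does not spell it out: the pull of one data set is (estimate - true value) / estimated error, and the width of the pull distribution is close to 1 when the quoted errors match the actual spread of the estimates. The Python sketch below illustrates that convention with generated toy numbers, not with results of the analysis; the function name `pull_width` is hypothetical.

```python
import random
import statistics


def pull_width(estimates, errors, true_value):
    """Standard deviation of the pulls (estimate - true) / error over many data sets."""
    pulls = [(est - true_value) / err for est, err in zip(estimates, errors)]
    return statistics.stdev(pulls)


random.seed(1)
true_value = 0.0
# Toy data sets: Gaussian estimates whose spread equals the quoted error of 0.16.
errors = [0.16] * 2000
estimates = [random.gauss(true_value, 0.16) for _ in errors]

width = pull_width(estimates, errors, true_value)
# Quoting errors that are too small by a factor of two doubles the pull width.
width_underestimated = pull_width(estimates, [0.08] * 2000, true_value)
print(width, width_underestimated)
```

A pull width significantly above 1 would indicate underestimated errors, and one below 1 overestimated errors.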
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Figure 1: Deviations of the estimated coupling values ($\alpha_1 = \lambda_\gamma$, $\alpha_2 = \Delta g_1^Z$) from the 
expansion values. 

a) $(\lambda_\gamma - \lambda_\gamma^0)$ as a function of $\{\lambda_\gamma^0, \Delta g_1^{Z,0}\}$, 

b) the intersection of (a) with the plane corresponding to zero deviation, 

c) $(\Delta g_1^Z - \Delta g_1^{Z,0})$ as a function of $\{\lambda_\gamma^0, \Delta g_1^{Z,0}\}$, 

d) the intersection of (c) with the plane corresponding to zero deviation. 

The fitted events have been produced with S.M. couplings. The stars in (b) and (d) 
indicate the point where both deviations are zero. 
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Figure 2: Deviations of the estimated coupling values ($\alpha_1 = \lambda_\gamma$, $\alpha_2 = \Delta g_1^Z$) from the 
expansion values. The fitted events in [a,b,c,d] have been produced with $\{\lambda_\gamma = -1, \Delta g_1^Z = 0\}$ 
whilst [e,f,g,h] correspond to events produced with $\{\lambda_\gamma = 1, \Delta g_1^Z = 0\}$. 

[a,e] $(\lambda_\gamma - \lambda_\gamma^0)$ as a function of $\{\lambda_\gamma^0, \Delta g_1^{Z,0}\}$, 

[b,f] the intersection of [a,e] with the plane corresponding to zero deviation, respectively, 

[c,g] $(\Delta g_1^Z - \Delta g_1^{Z,0})$ as a function of $\{\lambda_\gamma^0, \Delta g_1^{Z,0}\}$, 

[d,h] the intersection of [c,g] with the plane corresponding to zero deviation, respectively. 

The stars in [b,d] and [f,h] indicate the point where the estimated couplings are equal to 
the expansion values. 
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Figure 3: The mean values of the five functions a) to e), each a ratio with denominator $c_{00}(V)$, 
evaluated using M.C. events which have been produced with zero couplings and reconstructed 
with kinematical vectors $\Omega$ corresponding to a bin of the functions a) to e), respectively. 
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Figure 4: The distributions of $\chi^2$ (n.d.f.=2) probabilities in obtaining $\Delta$ [a,b,e,f] and $\delta$ 
[c,d,g,h] values in $\{\lambda_\gamma, \Delta g_1^Z\}$ [a,b,c,d] and $\{\Delta\kappa_\gamma, \Delta g_1^Z\}$ [e,f,g,h] estimations by the Iterative 
Optimal Variable technique. The lines with slopes consistent with zero in [a,c,e,g] are 
first degree polynomial fits to the bootstrap results. 
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Figure 5: The inclusive event distributions with respect to the five Optimal Variables. The 
events have been produced with Standard Model couplings. 



[Figure 6 panels. Horizontal axis of each panel: $\chi^2$ probability, from 0 to 1. 
Fitted slopes: (a) 5.6 ± 9.1, (c) 2.7 ± 9.2, (e) -1.6 ± 9.0, (g) 2.3 ± 9.2. 
Distribution moments: (b) mean = 0.51 ± 0.04, RMS = 0.29 ± 0.03; (d) mean = 0.48 ± 0.04, RMS = 0.28 ± 0.02; 
(f) mean = 0.48 ± 0.04, RMS = 0.29 ± 0.03; (h) mean = 0.50 ± 0.04, RMS = 0.28 ± 0.02.] 

Figure 6: The distributions of $\chi^2$ (n.d.f.=2) probabilities in obtaining $\Delta$ [a,b,e,f] and $\delta$ 
[c,d,g,h] values in $\{\lambda_\gamma, \Delta g_1^Z\}$ [a,b,c,d] and $\{\Delta\kappa_\gamma, \Delta g_1^Z\}$ [e,f,g,h] estimations by the 
Clustering technique. The lines with slopes consistent with zero in [a,c,e,g] are first degree 
polynomial fits to the bootstrap results. 
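
As a cross-check of what a flat probability distribution implies: a statistic that follows a $\chi^2$ distribution with two degrees of freedom has the upper-tail probability $\exp(-\chi^2/2)$, and this probability is uniformly distributed between 0 and 1, with mean 0.5 and standard deviation $1/\sqrt{12} \approx 0.29$. The Python sketch below illustrates this with toy numbers only. The helper name `chi2_prob_ndf2` and the toy sample are assumptions made for illustration, not the analysis code or data.

```python
import math
import random
import statistics


def chi2_prob_ndf2(chi2_value):
    """Upper-tail probability of a chi-square variable with 2 degrees of freedom."""
    return math.exp(-0.5 * chi2_value)


random.seed(7)
# Toy chi-square(2) values: the sum of squares of two unit Gaussian numbers.
toys = [random.gauss(0.0, 1.0) ** 2 + random.gauss(0.0, 1.0) ** 2 for _ in range(5000)]
probs = [chi2_prob_ndf2(x) for x in toys]

mean_p = statistics.mean(probs)    # expected near 0.5 for a uniform distribution
rms_p = statistics.pstdev(probs)   # expected near 1/sqrt(12)
print(mean_p, rms_p)
```

A mean near 0.5, a spread near 0.29 and a fitted slope consistent with zero are therefore the signatures of a correctly distributed probability.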
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Figure 7: Confidence interval estimations with the Iterative Optimal Variable technique. 
The distributions of errors [a,b,d,e] and correlations [c,f] in estimating the $\{\lambda_\gamma, \Delta g_1^Z\}$ 
[a,b,c] and $\{\Delta\kappa_\gamma, \Delta g_1^Z\}$ [d,e,f] pairs of couplings. The data points correspond to the 60 
independent data sets whilst the histograms correspond to the bootstrap results. The arrows indicate 
the average sensitivities summarized in Tables 1 and 2. 
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Figure 8: Confidence interval estimations with the Clustering technique. The distributions 
of errors [a,b,d,e] and correlations [c,f] in estimating the $\{\lambda_\gamma, \Delta g_1^Z\}$ [a,b,c] and $\{\Delta\kappa_\gamma, \Delta g_1^Z\}$ 
[d,e,f] pairs of couplings. The data points correspond to the 60 independent data sets whilst 
the histograms correspond to the bootstrap results. The arrows indicate the average sensitivities 
summarized in Tables 1 and 2. 
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Figure 9: The sampling, pull and error distributions of the $\Delta g_1^Z$ estimation by a binned Optimal 
Variable fit using the Clustering procedure proposed in this paper. 
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Figure 10: The sampling, pull and error distributions of the $\Delta g_1^Z$ estimation by a binned 
$\{\cos\theta_W, \cos\theta_l\}$ fit using the Clustering procedure proposed in this paper. 
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Figure 11: Comparison of confidence intervals in the ($\lambda_\gamma$, $\Delta g_1^Z$) plane obtained by: 
a) Iterative Optimal Variable, b) Multidimensional Clustering using mega-bins and 
c) Unbinned extended likelihood (assuming a "perfect" detector) estimation of the coupling 
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